#(Haley) Making Figures with Default Graphics and ggplot2
Gandrud outlines several useful chunk options for figures in R Markdown documents:
- fig.path: tells knitr where to save your figures by specifying a file path.
- out.height and out.width: set your figure’s output height and width. For HTML output these are usually given in pixels (e.g., to set the height to 200 pixels, use out.height='200px'). If you are using LaTeX, you can also specify the height or width in cm or in, or as a proportion of a page element (e.g., to set the width to 80% of the text width, use out.width='0.8\\textwidth').
- fig.align: sets the alignment of your figure to left, center, or right.
- fig.cap and fig.lp: fig.cap sets your figure’s caption, and fig.lp sets the prefix of its LaTeX label (by default fig:, followed by the chunk label).
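Rather than repeating these options in every chunk header, they can also be set once as document-wide defaults with knitr’s opts_chunk$set(), typically in a setup chunk near the top of the file. This is a minimal sketch; the figures/ folder and the 80% width are arbitrary example values, not settings this tutorial relies on.

```r
# Set default figure options for all later chunks (example values only)
knitr::opts_chunk$set(
  fig.path = "figures/",  # save figure files under figures/
  out.width = "80%",      # scale figures to 80% of the available width
  fig.align = "center"    # center figures on the page
)
```

Options written in an individual chunk header still override these defaults for that chunk.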
Because we already have some experience with Base R’s plotting functions, this section serves as a quick refresher on basic plotting before we get into “fancy” plotting with ggplot2. Every time you create a new R Markdown document in RStudio, you have probably noticed the pressure plot code chunk included in the template. That is exactly how to plot in Base R!
#taking a look at what the pressure data is
summary(pressure)
## temperature pressure
## Min. : 0 Min. : 0.0002
## 1st Qu.: 90 1st Qu.: 0.1800
## Median :180 Median : 8.8000
## Mean :180 Mean :124.3367
## 3rd Qu.:270 3rd Qu.:126.5000
## Max. :360 Max. :806.0000
#plotting the pressure data
plot(pressure)
Let’s try this again with some very unscientific fake data on different dogs’ fetch abilities and “goodness” ratings. For now, we will focus on their fetch abilities. Let’s start by loading the data file and checking it out.
#load and view data
library(readr) #provides read_csv()
GoodDogs <- read_csv("GoodDogs.csv")
##
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────
## cols(
## Dog = col_double(),
## Breed = col_character(),
## Age = col_double(),
## Fetch_Time = col_double(),
## Fetch_Distance = col_double(),
## Rating = col_double(),
## Good = col_double()
## )
View(GoodDogs)
Now that we can see we have data on both fetch time and fetch distance, let’s make a scatterplot. We will assume that fetch time depends on the distance the ball is thrown (and not on how fast a dog is or how focused it is on the task at hand), so distance goes on the x-axis and time on the y-axis.
plot(GoodDogs$Fetch_Distance, GoodDogs$Fetch_Time)
Now that we have the basic scatterplot down, let’s clean up our axis labels and add a title. We can do this by specifying the xlab, ylab, and main arguments in the plot() function.
plot(GoodDogs$Fetch_Distance, GoodDogs$Fetch_Time,
xlab = "Distance Ball is Thrown (ft)",
ylab = "Fetch Time (s)",
main = "Throw Distance vs Time while Playing Fetch")
‘ggplot2’ allows for additional flexibility and customization of our figures. Using the “grammar of graphics”, ‘ggplot2’ builds every graph from the same components:
- a data set (e.g., your data)
- a set of geoms (e.g., the points representing your data and how you want them to look)
- a coordinate system (e.g., defining where the data points are)
The grammar for this package can feel a little unwieldy at first, so keeping this [ggplot2 cheatsheet](https://rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf) handy might be helpful!
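As a small illustration of these components, here is a sketch that uses R’s built-in pressure data (rather than GoodDogs.csv, so it runs on its own). The aes() call maps columns to axes and connects the data to the geom; coord_cartesian() is already the default and is written out here only to make the coordinate-system component visible.

```r
library(ggplot2)

# Each component of the grammar is added as its own piece
p <- ggplot(data = pressure,                # the data set (built into R)
            mapping = aes(x = temperature,  # aesthetic mapping: which column
                          y = pressure)) +  #   goes on which axis
  geom_point() +                            # a geom: one point per row
  coord_cartesian()                         # the coordinate system (the default)

print(p)
```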
Let’s start to get to know ggplot better by revisiting our fetch data. First, we specify our data and map its columns to the x and y axes using ggplot() and aes(). Next, we specify that we want a scatterplot using geom_point().
#make a basic scatterplot
library(ggplot2)
#refer to columns by name inside aes(); ggplot() looks them up in GoodDogs
ggplot(GoodDogs, aes(x = Fetch_Distance, y = Fetch_Time)) +
  geom_point()
Now let’s make our figure look a little bit nicer by cleaning up our axis labels.
#add axis labels and a title
ggplot(GoodDogs, aes(x = Fetch_Distance, y = Fetch_Time)) +
geom_point() +
xlab("Distance Ball is Thrown (ft)") + #rename the x axis label
ylab("Fetch Time (s)") + #rename the y axis label
ggtitle("Throw Distance vs Time while Playing Fetch") + #add a plot title
theme(plot.title = element_text(hjust = 0.5)) #this centers our title
We also might want to know more about our data, like which points are associated with the different dog breeds in our dataset. This uses the same code as above, but adds a color argument to aes() so that each breed gets its own color.
#add color based on dog breed
ggplot(GoodDogs, aes(x = Fetch_Distance, y = Fetch_Time, color = Breed)) +
geom_point() +
xlab("Distance Ball is Thrown (ft)") +
ylab("Fetch Time (s)") +
ggtitle("Throw Distance vs Time while Playing Fetch") +
theme(plot.title = element_text(hjust = 0.5))
Finally, let’s refine our legend and title styling a bit more. We can also streamline our labels code using the labs function. Let’s also change our plot theme to remove the gridlines and gray background.
ggplot(GoodDogs, aes(x = Fetch_Distance, y = Fetch_Time, color = Breed)) +
geom_point() +
labs(color = "Dog Breeds", x="Distance Ball is Thrown (ft)", y="Fetch Time (s)") + #combine our label functions and add in an argument for the legend title
ggtitle("Throw Distance vs Time while Playing Fetch") +
theme(plot.title = element_text(hjust = 0.5, face = "bold"), #bold, centered title
      legend.title = element_text(face = "bold"), legend.title.align = 0.5, #bold, centered legend title
      panel.background = element_blank(), panel.border = element_blank(), #remove the gray background and border
      panel.grid.major = element_blank(), panel.grid.minor = element_blank(), #remove the gridlines
      axis.line = element_line(colour = "black")) #add black axis lines
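If you would rather not list every panel setting by hand, ggplot2’s built-in theme_classic() produces a similar look (white background, no gridlines, black axis lines). This is an alternative to the approach above, not the tutorial’s own code. The sketch uses the built-in pressure data so it runs without GoodDogs.csv, and the title and axis labels are our own.

```r
library(ggplot2)

# theme_classic() supplies the white background, missing gridlines, and axis
# lines; theme() then only needs to adjust the title
p <- ggplot(pressure, aes(x = temperature, y = pressure)) +
  geom_point() +
  labs(title = "Vapor Pressure of Mercury",
       x = "Temperature (deg C)", y = "Pressure (mm Hg)") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"))  # centered, bold title

print(p)
```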
At its core, a network is simply a set of vertices connected by a set of edges. There are many kinds of networks, and network analyses are used across disciplines. For instance, networks of scientific collaboration, a food web of marine animals, and American college football games are all covered in a paper on community detection in networks by Girvan and Newman (2002). Additionally, Buldyrev et al. (2010) study node failure in interdependent networks such as power grids. Social networks, such as links between television and film actors found on http://www.imdb.com/, and neural networks, such as the completely mapped neural network of the C. elegans worm, are also extensively studied (Watts and Strogatz, 1998).
In network analyses, “nodes” designate the vertices of a network, whereas “edges” indicate the ties between nodes. The edges in networks can be directed, indicating an ordering of vertices (so that switching the direction of an edge would change the structure of the network), or undirected, meaning the edges are simply connections between vertices where order does not matter.
- The World Wide Web is an example of a directed network: hyperlinks connect one Web page to another, but not necessarily the other way around!
- Co-authorship networks are examples of undirected networks, wherein nodes are authors who are connected by an edge if they have written a publication together (direction does not matter!).
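One way to see the difference is through an adjacency matrix, where entry [i, j] is 1 if there is a tie from node i to node j. A directed network’s matrix need not be symmetric, while an undirected network’s matrix always is. This is a minimal base R sketch; the three-node edge list and the helper to_adjacency() are hypothetical and written only for this example.

```r
# Hypothetical edge list: each row is one tie, from column 1 to column 2
nodes <- c("A", "B", "C")
edges <- rbind(c("A", "B"),
               c("B", "C"))

# Build a 0/1 adjacency matrix with named rows (senders) and columns (receivers)
to_adjacency <- function(edges, nodes, directed) {
  adj <- matrix(0L, length(nodes), length(nodes), dimnames = list(nodes, nodes))
  from <- match(edges[, 1], nodes)
  to <- match(edges[, 2], nodes)
  adj[cbind(from, to)] <- 1L                 # record each tie once
  if (!directed) adj[cbind(to, from)] <- 1L  # undirected: mirror every tie
  adj
}

web <- to_adjacency(edges, nodes, directed = TRUE)         # like hyperlinks
coauthors <- to_adjacency(edges, nodes, directed = FALSE)  # like co-authorship

isSymmetric(web)        # A links to B, but B does not link back to A
isSymmetric(coauthors)  # co-authorship runs both ways
```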
To continue our exploration of ggplot2, we will use the ggnet2 function, which offers a wide range of network visualization options in a single function call. ggnet2 plots network objects as ggplot2 objects that can be styled using ggplot2 scales and themes. Although ggnet2 uses a syntax that may be familiar to those who have worked with ggplot2, it is also designed to be easily understood by users who are not familiar with ggplot2 objects. Thus, while ggnet2 applies the “grammar of graphics” to network objects, the function itself works very much like the plotting functions of the igraph and/or network packages, in that a long series of arguments controls pretty much every aspect of the network visualization. ggnet2 is available through the GGally package.
For the purpose of this tutorial, we are going to create a basic undirected random network with 10 nodes named “a, b, …, i, j”, in which each possible edge between two nodes is present with probability 0.5 (the tprob argument):
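#load the packages used below: GGally (ggnet2), network (network objects), sna (rgraph and layouts)
library(GGally)
library(network)
library(sna)
# random graph
net = rgraph(10, mode = "graph", tprob = 0.5)
net = network(net, directed = FALSE)
#vertex names
network.vertex.names(net) = letters[1:10]
#visualize the network
ggnet2(net)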
The net argument is the only required argument of ggnet2. It can be a network object or any object that can be coerced to that class through its edgeset.constructors functions, such as adjacency matrices, incidence matrices, and edge lists.
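The call rgraph(10, mode = "graph", tprob = 0.5) draws a Bernoulli random graph, in which each possible tie between two distinct nodes is present independently with probability 0.5. The base R sketch below imitates that idea and produces a symmetric adjacency matrix, one of the object types that network() can take (e.g., network(adj, directed = FALSE)). The helper random_undirected_adjacency() is hypothetical, is not part of sna, and will not reproduce rgraph’s exact draws.

```r
random_undirected_adjacency <- function(n, p, labels = letters[seq_len(n)]) {
  adj <- matrix(0L, n, n, dimnames = list(labels, labels))
  upper <- upper.tri(adj)                               # each pair once, no self-loops
  adj[upper] <- rbinom(sum(upper), size = 1, prob = p)  # one coin flip per pair
  adj + t(adj)                                          # mirror so ties are undirected
}

set.seed(2020)  # make the random draw reproducible
adj <- random_undirected_adjacency(10, 0.5)
sum(adj[upper.tri(adj)])  # number of edges; 45 possible pairs, so 22.5 on average
```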
The most basic properties of our network that we might want to change are the size/color of the nodes and/or the size/color of the edges. Let’s practice modifying each of these properties…
# editing node and edge properties
ggnet2(net, node.size = 6, node.color = "black", edge.size = 1, edge.color = "grey")
Note that the vertex-related arguments of ggnet2 start with node, and the edge-related arguments start with edge. To save time, we can also shorten node.color and node.size to color and size:
ggnet2(net, size = 6, color = "black", edge.size = 1, edge.color = "grey")
Using these basic methods, we can set the color, size, shape, and even transparency of the nodes. Let’s practice! Using the code chunks above as an example, modify your network so that it contains:
In addition to the attributes modified above, we can also modify the POSITION of our nodes. By default, ggnet2 places nodes using the Fruchterman-Reingold force-directed algorithm. However, there are other algorithms we might want to use instead. There is no single “best” layout algorithm, and different approaches may be valuable under different circumstances. For more information, see the documentation of the gplot.layout function (in the sna package) for the list of placement algorithms. Let’s test out a few common algorithms. How do these networks differ from one another?
# algorithms of node placement
ggnet2(net)
ggnet2(net, mode = "circle")
ggnet2(net, mode = "kamadakawai")
ggnet2(net, mode = "random")
As noted, the default is Fruchterman-Reingold, which generates a layout using a variant of Fruchterman and Reingold’s force-directed placement algorithm. The circle layout places vertices uniformly in a circle and cannot be modified by any additional arguments. The kamadakawai layout generates a vertex layout using a version of the Kamada-Kawai force-directed placement algorithm. As one might expect, the random layout places vertices randomly: you can re-run this line of code repeatedly and see all the different node arrangements that are randomly generated!
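To make the difference between these placements concrete, here is a simplified base R sketch of two of them. circle_layout() spaces n nodes evenly on a unit circle. fr_layout() is a stripped-down force-directed loop in the spirit of Fruchterman and Reingold: every pair of nodes repels with force k²/d, connected nodes attract with force d²/k, and the maximum step shrinks as a “temperature” cools. Both helpers are hypothetical, the constants (iterations, starting temperature, cooling rate) are arbitrary, and the sna layout functions used by ggnet2 are more complete.

```r
# Evenly spaced points on a unit circle, one per node
circle_layout <- function(n) {
  angle <- 2 * pi * (seq_len(n) - 1) / n
  cbind(x = cos(angle), y = sin(angle))
}

# Simplified force-directed placement; adj is a symmetric 0/1 matrix
fr_layout <- function(adj, iter = 200, seed = 1) {
  set.seed(seed)                     # random starting positions, reproducibly
  n <- nrow(adj)
  k <- sqrt(1 / n)                   # ideal distance between nodes in a unit area
  pos <- matrix(runif(2 * n), n, 2)
  temp <- 0.1                        # largest step a node may take this round
  for (it in seq_len(iter)) {
    dx <- outer(pos[, 1], pos[, 1], "-")  # dx[i, j] = x_i - x_j
    dy <- outer(pos[, 2], pos[, 2], "-")
    d <- sqrt(dx^2 + dy^2) + 1e-9         # pairwise distances (avoid dividing by 0)
    force <- k^2 / d - adj * d^2 / k      # repulsion for all pairs, attraction on edges
    diag(force) <- 0
    disp <- cbind(rowSums(force * dx / d), rowSums(force * dy / d))
    len <- sqrt(rowSums(disp^2)) + 1e-9
    pos <- pos + disp * (pmin(len, temp) / len)  # cap each node's move at temp
    temp <- temp * 0.98                   # cool down so the layout settles
  }
  pos
}

# Example: a four-node path a - b - c - d
adj <- matrix(0, 4, 4, dimnames = list(letters[1:4], letters[1:4]))
adj[cbind(1:3, 2:4)] <- 1
adj <- adj + t(adj)
circ <- circle_layout(4)
fr <- fr_layout(adj)
```

The circle layout ignores the edges entirely, whereas the force-directed result depends on which nodes are connected, which is one reason the two plots above look so different.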
Open up the help documentation for the gplot.layout function (?gplot.layout) and look at the list of possible layouts. Choose one we haven’t looked at yet and edit your network code to use it as the mode. How did your choice alter the arrangement of the network compared with the default settings?
We have already considered how to do a basic modification of node colors. Let’s now assign a vertex attribute, “phono”, which indicates whether the name of the vertex is a vowel or a consonant. This attribute can be passed to ggnet2 to indicate that the nodes belong to a group. We will pass the name of the vertex attribute to the color argument, which will then use it to map the colors of the nodes.
# indicating vowel or consonant
net %v% "phono" = ifelse(letters[1:10] %in% c("a", "e", "i"), "vowel", "consonant")
# map node color based on vertex attribute
ggnet2(net, color = "phono")
By default, ggnet2 assigns a grayscale color to each group, but we can modify this behavior! There are different options to modify the color assignment. Let’s try out a few options! One method consists of “hard-coding” the colors into the graph by assigning them to a vertex attribute, and then passing this attribute to ggnet2:
# hard-coding the color assignments
net %v% "color" = ifelse(net %v% "phono" == "vowel", "steelblue", "tomato")
ggnet2(net, color = "color")
We could also create a named vector consisting of a color legend through the palette argument, or generate a color vector “on the fly” directly in the function call (a more condensed version of the first option):
#color legend as a named vector using the palette argument
ggnet2(net, color = "phono", palette = c("vowel" = "steelblue", "consonant" = "tomato"))
# generate color vector on the fly
ggnet2(net, color = ifelse(net %v% "phono" == "vowel", "steelblue", "tomato"))
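Whichever form you use, the end result is one color per node. Conceptually, a named palette works like a lookup table keyed by the attribute’s values. Here is a base R sketch of that idea, using a plain phono vector built the same way as the vertex attribute above; it shows the lookup only, not how ggnet2 applies the palette internally.

```r
phono <- ifelse(letters[1:10] %in% c("a", "e", "i"), "vowel", "consonant")
phono_palette <- c("vowel" = "steelblue", "consonant" = "tomato")

# Indexing a named vector by name returns one color per node, in node order
node_colors <- unname(phono_palette[phono])
node_colors
```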
Lastly, we can also use pre-defined color palettes from the RColorBrewer package. When palette is set to the name of any ColorBrewer palette, ggnet2 will use that palette to color the nodes. If this returns an error message, the chosen palette may not have enough colors to cover all of the node groups.
# using pre-defined color palettes
ggnet2(net, color = "phono", palette = "Set2")
Now let’s start to think about the size of our nodes! In network analyses, it is common to size the nodes by their centrality or some other element of interest. Just like the color argument, the size argument of ggnet2 can take a single numeric value, a vector of values, or a vertex attribute:
# changing node size with a vertex attribute
ggnet2(net, size = "phono")
Just as we used palettes to change the color of the nodes, we can use the size.palette argument to create nodes of different sizes that are easier to distinguish visually:
# using size.palette
ggnet2(net, size = "phono", size.palette = c("vowel" = 10, "consonant" = 1))
We can also size the nodes by their degree centrality, that is, their number of connections within the network. For directed networks, we can define two separate measures of degree centrality: indegree, which is the number of ties directed to the node, and outdegree, which is the number of ties the node directs to others.
When ties are associated with positive aspects such as friendship or collaboration, indegree is often interpreted as a form of popularity, and outdegree as gregariousness. ggnet2 also recognizes total (or Freeman) degree, which is the sum of a node’s indegree and outdegree. (This is different from betweenness centrality, which measures how often a node acts as a bridge along the shortest paths between two other nodes.) In addition to “indegree”, “outdegree”, and “freeman”, ggnet2 also understands the argument “degree”, which is equivalent to “freeman”.
ggnet2(net, size = "degree")
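These degree measures are easy to compute by hand from an adjacency matrix: outdegree is a row sum (ties a node sends), indegree is a column sum (ties it receives), and total (Freeman) degree is their sum. Here is a base R sketch on a small, hypothetical directed network. In a symmetric (undirected) matrix, the row sums and column sums are identical.

```r
# Hypothetical directed network: adj[i, j] = 1 means a tie from i to j
nodes <- c("a", "b", "c", "d")
adj <- matrix(0L, 4, 4, dimnames = list(nodes, nodes))
adj["a", c("b", "c", "d")] <- 1L  # a sends ties to everyone else
adj["b", "a"] <- 1L               # b sends one tie back to a
adj["c", "b"] <- 1L               # c sends a tie to b

outdegree <- rowSums(adj)         # ties each node sends
indegree <- colSums(adj)          # ties each node receives
freeman <- indegree + outdegree   # total (Freeman) degree
freeman
```

Here node a has the largest outdegree (3) and node b the largest indegree (2), so sizing by “outdegree” or “indegree” would highlight different nodes.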
Change your network to reflect either “indegree” or “outdegree.” Did it make a noticeable difference in your network visualization? Why or why not?
You may have already realized that circles are the default node shape for ggnet2, but they are not the only option! We can also modify the shape and transparency of the nodes in the same manner that we modified their color and size: through a single value, a vector of values, or a vertex attribute!
# changing shape using a single value
ggnet2(net, color = "phono", shape = 15)
# changing shape using a vertex attribute
ggnet2(net, color = "phono", shape = "phono")
## Warning: Duplicated override.aes is ignored.
Note: the second example above will return a warning about a duplicated plotting parameter. This is an innocuous warning that is produced by mapping two characteristics of the nodes to the same vertex attribute. It cannot be avoided without modifying ggplot2.
Again, just like color and size, the alpha and shape arguments can take manual “palettes” of values through the alpha.palette and shape.palette arguments.
# using palettes to change transparency of nodes
ggnet2(net, alpha = "phono", alpha.palette = c("vowel" = 0.2, "consonant" = 1))
# using palettes to change the shape of the nodes
ggnet2(net, shape = "phono", shape.palette = c("vowel" = 19, "consonant" = 15))
Recall that we used “palettes” to specify which colors we wanted our consonant and vowel nodes to display as. Can we use the palette argument to do the same thing with shapes (e.g. manually specify which shape we want consonants vs. vowels to appear as)? Let’s try it out!
When it comes to making these customizations, it is important to consider what you are trying to communicate with your network. ggnet2 is pretty flexible when it comes to changing node shapes and transparency, which makes it easy to go overboard. Try to make the minimal set of modifications that communicates what is important in your network: node shapes become difficult to distinguish if you use more than six different shapes in a plot, and differences in transparency may not be easy for the reader to tell apart.
# example of overly modified node shapes
ggnet2(net, shape = sample(1:10))
#example of nodes of different transparencies
ggnet2(net, alpha = "phono")
Through the label argument, we can also use ggnet2 to label the nodes of a network using vertex names, another vertex attribute, or any other vector of labels:
# labeling using vertex names
ggnet2(net, label = TRUE)
# labeling using vertex attribute
ggnet2(net, label = "phono")
#labeling using vector of labels
ggnet2(net, label = 1:10)
We can also choose WHICH nodes we want to label. Recall that the node names in this network are letters, so we can choose to label nodes based on whether they are consonants or vowels:
# labeling only vowels using a vector of values
ggnet2(net, label = c("a", "e", "i"), color = "phono", label.color = "black")
By default, ggnet2 sets the size of the labels to half the size of the nodes, but we can control the size of the labels using the label.size argument, their color using the label.color argument, and their level of transparency using the label.alpha argument:
# changing label size
ggnet2(net, size = 12, label = TRUE, label.size = 5)
#changing label color
ggnet2(net, size = 12, label = TRUE, color = "black", label.color = "white")
# changing label transparency
ggnet2(net, label = TRUE, label.alpha = 0.75)
Using the code above as an example, modify your network so that the labels are 1/4 as big as the node size, the nodes are tomato-colored with steel blue labels, and 50% transparency!
There are a LOT more things you can do with networks than there is time to go over in this tutorial. Some other ways we could have modified our network using ggnet2 include:
- altering the node legends using the alpha.legend, color.legend, shape.legend, and size.legend arguments
- changing the line type of the edges
- adding arrows to our edges to indicate directionality
- coloring edges based on the attributes of connected nodes
- removing nodes based on missing values
Furthermore, ggnet2 is just ONE package we can use to visualize networks. Other common packages for visualizing networks include igraph and networkD3.
- igraph is useful for building a network diagram from an adjacency matrix, an edge list, a literal list of connections, and more.
- networkD3 allows users to build interactive network diagrams with R, with features such as zooming, hovering over nodes, and reorganizing the layout. It provides features for dynamic data manipulation and visualization, allowing users to become active participants in the visualization process by exploring data points, hierarchies among the data, filtering data by groups, and more.
#(Paige) Exercise 3: Introduction to phylogenies
Citations of all R packages used to generate this report.
library("knitcitations")
cleanbib()
options("citation_format" = "pandoc")
read.bibtex(file = "packages.bib")
[1] C. T. Butts. “network: a Package for Managing Relational Data in R”. In: Journal of Statistical Software 24.2 (2008). <URL: https://www.jstatsoft.org/v24/i02/paper>.
[2] C. T. Butts. network: Classes for Relational Data. R package version 1.16.1. 2020. <URL: http://statnet.org/>.
[3] C. T. Butts. sna: Tools for Social Network Analysis. R package version 2.6. 2020. <URL: http://statnet.org>.
[4] P. N. Krivitsky. statnet.common: Common R Scripts and Utilities Used by the Statnet Project Software. R package version 4.4.1. 2020. <URL: https://statnet.org>.
[5] E. Neuwirth. RColorBrewer: ColorBrewer Palettes. R package version 1.1-2. 2014. <URL: https://CRAN.R-project.org/package=RColorBrewer>.
[6] E. Paradis, S. Blomberg, B. Bolker, et al. ape: Analyses of Phylogenetics and Evolution. R package version 5.4-1. 2020. <URL: http://ape-package.ird.fr/>.
[7] E. Paradis and K. Schliep. “ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R”. In: Bioinformatics 35 (2019), pp. 526-528.
[8] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2020. <URL: https://www.R-project.org/>.
[9] B. Schloerke, D. Cook, J. Larmarange, et al. GGally: Extension to ggplot2. R package version 2.0.0. 2020. <URL: https://CRAN.R-project.org/package=GGally>.
[10] G. Stulp. ggplotgui: Create Ggplots via a Graphical User Interface. R package version 1.0.0. 2017. <URL: https://github.com/gertstulp/ggplotgui/>.
[11] H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016. ISBN: 978-3-319-24277-4. <URL: https://ggplot2.tidyverse.org>.
[12] H. Wickham, W. Chang, L. Henry, et al. ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. R package version 3.3.2. 2020. <URL: https://CRAN.R-project.org/package=ggplot2>.
[13] H. Wickham and J. Hester. readr: Read Rectangular Text Data. R package version 1.4.0. 2020. <URL: https://CRAN.R-project.org/package=readr>.
Version information about R, the operating system (OS), and attached or loaded R packages. This appendix was generated using sessionInfo().
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitcitations_1.0.10 RColorBrewer_1.1-2 sna_2.6
## [4] statnet.common_4.4.1 network_1.16.1 GGally_2.0.0
## [7] ggplotgui_1.0.0 ggplot2_3.3.2 readr_1.4.0
## [10] ape_5.4-1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.5 lubridate_1.7.9 lattice_0.20-41 tidyr_1.1.2
## [5] assertthat_0.2.1 digest_0.6.25 mime_0.9 rle_0.9.2
## [9] R6_2.4.1 cellranger_1.1.0 plyr_1.8.6 evaluate_0.14
## [13] coda_0.19-4 httr_1.4.2 highr_0.8 pillar_1.4.6
## [17] rlang_0.4.8 lazyeval_0.2.2 readxl_1.3.1 rstudioapi_0.11
## [21] data.table_1.13.0 rmarkdown_2.4 labeling_0.3 RefManageR_1.2.12
## [25] stringr_1.4.0 htmlwidgets_1.5.2 munsell_0.5.0 shiny_1.5.0
## [29] compiler_4.0.2 httpuv_1.5.4 xfun_0.18 pkgconfig_2.0.3
## [33] htmltools_0.5.0 tidyselect_1.1.0 tibble_3.0.3 reshape_0.8.8
## [37] fansi_0.4.1 viridisLite_0.3.0 crayon_1.3.4 dplyr_1.0.2
## [41] withr_2.3.0 later_1.1.0.1 grid_4.0.2 nlme_3.1-149
## [45] jsonlite_1.7.1 xtable_1.8-4 gtable_0.3.0 lifecycle_0.2.0
## [49] magrittr_1.5 scales_1.1.1 bibtex_0.4.2.3 cli_2.0.2
## [53] stringi_1.5.3 farver_2.0.3 promises_1.1.1 xml2_1.3.2
## [57] ellipsis_0.3.1 generics_0.0.2 vctrs_0.3.4 tools_4.0.2
## [61] forcats_0.5.0 glue_1.4.2 purrr_0.3.4 hms_0.5.3
## [65] parallel_4.0.2 fastmap_1.0.1 yaml_2.2.1 colorspace_1.4-1
## [69] plotly_4.9.2.1 knitr_1.30 haven_2.3.1